Bioinformatics analysis identifies potential biomarkers involved in the metastasis of locoregionally advanced nasopharyngeal carcinoma

Nasopharyngeal carcinoma (NPC) is one of the malignant epithelial tumors with a high metastasis rate. This study aimed to screen potential novel biomarkers involved in NPC metastasis. Microarray data of locoregionally advanced NPC (LA-NPC; GSE103611) were obtained from the database of Gene Expression Omnibus. The differentially expressed genes (DEGs) between LA-NPC tissues with and without distant metastasis after radical treatment were screened. Functional analysis was performed and the protein–protein interaction and submodule were analyzed. The univariate Cox regression analysis was performed to identify prognostic genes in NPC in the validation microarray dataset GSE102349. The drug–gene interactions and key genes were identified. Totally, 107 DEGs were identified. The upregulated DEGs and the key nodes in the protein–protein interaction network were associated with pathways or biological processes related to the cell cycle. Four genes including CD44, B2M, PTPN11, and TRIM74 were associated with disease-free survival in NPC. The drug–gene interaction analysis revealed that upregulated genes CXCL10, CD44, B2M, XRCC5, and RPL11 might be potential druggable genes for patients with LA-NPC metastasis by regulating cell cycle, autophagy, and drug resistance. Upregulated CXCL10, CD44, B2M, XRCC5, and RPL11 might play important roles in LA-NPC metastasis by regulating cell cycle-related pathways.


Introduction
Nasopharyngeal carcinoma (NPC) is one of the malignant epithelial tumors arising from the nasopharynx epithelium with a high incidence. [1,2] The global morbidity and mortality of NPC is about 133,000 and 80,000 cases in 2020, respectively. [1] NPC is an Epstein-Barr virus-associated malignancy and has a characterized geographical distribution. [2,3] Over 70% of cases are in the east and southeast parts of Asia. [2,3] Advanced diagnostic tools, sophisticated surgical resection, and conventional pharmacological treatments confer effective prevention and treatment for NPC. [4] Notably, despite a large proportion of patients with locoregionally advanced NPC (LA-NPC) respond well to the above therapeutic methods, patients with secondary or radiation-induced diseases still have an unsatisfactory prognosis and poor survival due to high rates of recurrence and metastasis. [2,5,6] Therefore, it is important to identify novel markers and therapeutic interventions for LA-NPC metastasis.
NPC is characterized by multiple histopathological appearances and complex etiopathogenesis. [7] Cervical lymph node metastasis is the major cause of NPC-related deaths. [7] Numerous characteristic markers of NPC metastasis have been identified by bioinformatics analysis with the revolution of high throughout sequencing technologies, which facilitate early recognition and targeted treatment interventions for NPC metastasis. Using genome-wide methylation microarray data, Ren et al [8] identified aberrantly methylated NPC-specific transcription factors, RH and XX gave equal contribution to this work.
The authors have no funding and conflicts of interest to disclose.
The datasets generated during and/or analyzed during the current study are publicly available. The original microarray dataset are available at the National Center of Biotechnology Information Gene Expression Omnibus database (https://www.ncbi. nlm.nih.gov/geo/) with the accession numbers GSE103611 and GSE102349. All data generated or analyzed during this study are included in this published article. Medicine including the most significantly hypermethylated HOP homeobox HOPX, which promoted NPC metastasis and resulted in poor clinical outcomes. In addition, several research studies showed that genes including matrix metalloproteinase 13, [9] cyclooxygenase-2, [10] flotillin-1, [11] and serine protease inhibitor Kazal-type 6 [12] were associated with NPC metastasis. Some of them were associated with the clinical prognosis of NPC. [8,[12][13][14] A recent study based on microarray analysis by Tang and his colleagues [15] established a reliable prognostic 13-gene signature to predict distant metastasis of LA-NPC. The recent biomarkers were limited to improving the diagnosis and clinical therapy of NPC, and further work is still urgently needed to identify potential biomarkers for NPC metastasis, recurrence, and prognosis.
This study aimed to identify potential marker genes associated with LA-NPC metastasis. The microarray data of LA-NPC was used to identify differentially expressed genes (DEGs) between samples from LA-NPC patients with and without distant metastasis. Meanwhile, potential prognostic and druggable DEGs in NPC were identified. This study might provide additional information on the metastasis, development, and treatment intervention of LA-NPC.

Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors, therefore the ethical approval was not applicable.

Data acquisition
Gene expression microarray dataset GSE103611 was downloaded from the National Center of Biotechnology Information Gene Expression Omnibus database (GEO, http://www.ncbi. nlm.nih.gov/geo/). [16] This dataset is based on the platform of GPL19251 (HuGene-2_0-st) Affymetrix Human Gene 2.0 ST Array (probe set [exon] version). Twenty-four LA-NPC tumor tissues with distant metastasis after radical treatment (metastasis group) and 24 paired LA-NPC tissue specimens without distant metastasis (nonmetastasis group) were included in this dataset.

Data preprocessing and DEGs screening
Data preprocessing was performed using the Oligo package (version 1.36.1, http://bioconductor.org/packages/release/bioc/ html/oligo.html) of R software, which mainly consisted of background correction, normalization, and expression calculation. Probe that was not mapped to any gene symbol was removed from our analysis. When, however, 1 gene was mapped by multiple probes, the mean value of the probe was considered as the expression value of this gene. The classical Bayes method provided by the Limma package [17] was used to screen DEGs between the metastasis and nonmetastasis tumors. Significant DEGs were identified according to the criteria of P value of <.05 and |log 2 (fold change)| >.5.

Gene Ontology (GO) and KEGG pathway enrichment analyses
To assess functions and significantly enriched pathways of DEGs, GO functional annotation of the biological process [18] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [19] were conducted with clusterProfiler. [20] The thresholds for significantly enriched biological processes and KEGG pathways were P value < .05 and count ≥ 2.

Protein-protein interaction network construction and submodule analysis
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (version: 10.0, http://www.string-db.org/) [21] that provides the protein-protein interaction (PPI) prediction function is online available. Therefore, we conducted the PPI analysis of DEGs based on this database using the PPI score of 0.15 and expected to identify crucial protein pairs. Afterward, the PPI network was visualized by the Cytoscape software (version 3.2.0, http://www.cytoscape.org/). [22] Moreover, the molecular complex detection (MCODE, version 1.4.2, http://apps. cytoscape.org/apps/MCODE) [23] plugin of Cytoscape was used to analyze submodules with similar functions in the original PPI network. Functional enrichment analysis of GO biological processes and KEGG pathways associated with DEGs were performed using clusterProfiler [20] to investigate the functions of submodules. The top 10 nodes with high degrees in the PPI network and all nodes included in submodules were regarded as hub genes for further analyses.

Screening of validation microarray datasets
To validate the prognostic values and expression patterns of hub genes in NPC tumors, the microarray dataset GSE102349 was downloaded from the GEO database. The microarray dataset was selected according to the following criteria: included of survival data (overall or disease-free survival [DFS] data); included clinical stage; the number of patients was not <20. The GSE102349 dataset (GPL11154, Illumina HiSeq 2000 [Homo sapiens]) included data on DFS, clinical stage, and morphology (differentiated, mixed [round and spindle], and undifferentiated) from 133 patients with NPCs.

Prediction of drug-gene interactions
The Drug-Gene Interaction Database (DGIdb, version 4.2.0; http://www.dgidb.org/) [24] is used to mine existing resources and generate assumptions about how genes are therapeutically targeted or prioritized for drug development. Only FDA-approved drugs were selected to predict drug-gene interactions for hub genes in the PPI network and submodules. Meanwhile, the drug-gene interaction network was constructed and visualized using the Cytoscape software. [22] 2.8. Literature searching for key genes GeneCLiP2.0 (http://ci.smu.edu.cn/), [25] a web-based service platform based on literature mining, has 4 major functional modules, including gene function annotation, molecular network construction, GO, and pathway analysis. Literature searching was performed for the druggable genes by Gene Cluster with Literature Profiles. P value <.05 and count ≥ 3 were set as thresholds for significant items.

Statistical analysis
The association of hub genes (top 10 nodes in the PPI network and nodes in submodules) with the DFS of NPC was validated in the dataset GSE102349. The univariate Cox regression analysis was performed to investigate the association of hub genes with the DFS of NPC in the GSE102349 dataset. Differences in the expression levels of hub genes across different clinical stages (I/II, III, and IV) and morphologies (differentiated, mixed, and undifferentiated) were analyzed using the nonparametric Kruskal-Wallis H test with Dunn multiple comparisons test. All statistical analyses were carried out using the SPSS software (version 22.0; IBM, Armonk, NY) or the Graphpad Prism software (version 8.3.0; GraphPad software, San Diego, CA). Statistical differences were defined when P value was <.05.

Identification of DEGs
Totally, 107 DEGs between LA-NPC tissue specimens with distant metastasis after radical treatment and LA-NPC tissue specimens without distant metastasis, including 98 upregulated genes and 9 downregulated genes were screened according to the methods described above (Fig. 1).

Functional enrichment analysis of DEGs
Functional enrichment analysis showed that upregulated DEGs were enriched in 5 KEGG pathways, including "pathogenic Escherichia coli infection," "arachidonic acid metabolism," "adherens junction," and "EBV infection" (Fig. 2A). GO analysis indicated that upregulated DEGs were enriched in 272 GO biological processes. The top 10 biological processes are shown in Fig. 2B, such as "RNA splicing," "regulation of intrinsic apoptotic signaling pathway," "mRNA processing," and "spliceosomal complex assembly." However, downregulated DEGs were not enriched in any KEGG pathways and GO biological processes.

PPI network analysis
Overall, 66 nodes and 232 protein interaction pairs were included in the PPI network (Fig. 3A). The degree top 10 nodes are exhibited in Table 1 and were used as hub genes, which mainly contained CD44, SRSF6, RPL11, SMG1, HIST1H4D, PTGES3, SFPQ, XRCC5, SNAI2, and HNRNPH3. These  Figure 3B, inclusive of "telomere maintenance," "telomere organization," and "anatomical structure homeostasis." Furthermore, the PPI network included 2 submodules (Fig. 3A). Specifically, module-a (score = 4.4) contained 6 upregulated genes, which were mainly associated with RNA splicing, ribonucleoprotein complex assembly, and ribonucleoprotein  Table 1 Top 10 genes in the protein-protein interaction network and genes in the submodules (hub gene list).

The expression patterns of hub genes in NPC tumors with different clinical stages and morphologies
Twenty nonoverlapping DEGs, including the top 10 nodes in the PPI network and all nodes in the 2 submodules, were regarded as hub genes ( Table 1). The GSE102349 microarray validation showed that only 4 and 5 of the 20 hub DEGs were differentially expressed in tumors in different clinical stages and morphologies, respectively (nonparametric Kruskal-Wallis H test, P < .05; Fig. 5A, B). Statistical analysis showed that genes LUC7L3, CXCL10, and XRCC5 were upregulated in advanced NPC tumors (III or IV) compared with early-stage tumors (I/II), while the B2M gene was downregulated in advanced NPC tumors (P < .05; Fig. 5A). Also, genes PTPN11, XRCC5, and SNAI2 were downregulated in mixed (round and spindle) and/or undifferentiated tumors compared with differentiated tumors (P < .05; Fig. 5B), while genes HIST3H2BB and B2M were upregulated in mixed and/or undifferentiated tumors (P < .05; Fig. 5B).

Screening of prognostic genes in NPC by univariate Cox regression analysis
The univariate Cox regression analysis was performed to screen factors and hub genes associated with DFS in NPC patients in the GSE102349 dataset. The results showed that stromal tumor-infiltrating lymphocytes (TILs) percent and the expression levels of CD44, B2M, PTPN11, and TRIM74 were associated with DFS in NPC patients (

Discussion
Recently, extensive studies have concentrated on illuminating NPC pathogenesis and the identification of biomarkers for NPC via bioinformatics analyses. [26,27] The present study identified 107 DEGs between LA-NPC tissues with and without distant metastasis after radical treatment. The upregulated DEGs were mainly enriched in cell cycle-related pathways. Hub genes in the PPI network, including CD44, SRSF6, RPL11, SMG1, HIST1H4D, PTGES3, SFPQ, XRCC5, SNAI2, and HNRNPH3, were also associated with cell cycle-related pathways, including telomere maintenance and organization. Cox regression analysis showed  Morphology is categorized into, differentiated, mixed (round and spindle), and undifferentiated. CI = confidence interval, HR = hazard ratio, TIL = tumor-infiltrating lymphocytes. www.md-journal.com that the percent of stromal TILs and the expression levels of CD44, B2M, PTPN11, and TRIM74 were associated with DFS in NPC patients. Genes including CXCL10, B2M, XRCC5, and PTPN11, were differentially expressed in NPC tumors of different clinical stages and morphologies. Genes of CXCL10, CD44, B2M, XRCC5, and RPL11 might be potentially druggable genes for patients with LA-NPC metastasis by regulating various biological functions, including cell cycle, cell proliferation, drug resistance, and autophagy. Cancer is characterized by the deregulation of cell cycle controls, which leads to the uncontrolled growth of tumor cells. [28] This study found that most of the upregulated DEGs in metastatic NPC tumors were mainly involved in biological processes and pathways related to the cell cycle, such as RNA splicing, mRNA processing, telomere maintenance, and regulation of chromosome organization. As the basic biological process, the cell cycle is defined as the process from the end of a cell division to the end of the next cell division and consists of 2 main events: DNA synthesis and cell division. There are 2 major mechanisms for the regulation of the tumor cell cycle, including cell cycle-driven mechanisms and monitoring mechanisms. [28] The deregulation of cell cycle mechanisms can contribute to the uncontrolled growth of normal cells and conversion to tumor cells. [28] Meanwhile, the uncontrolled cell cycle monitoring mechanisms can also occur in cell proliferation, DNA repair, cell death, etc, thereby causing genomic instability. [28] It is well known that genomic instability is mainly manifested as DNA mutations, deletions, or ectopic, as well as chromosome malformations and aneuploidy, which can result in a burst of mutated genes related to cell cycle regulation. [28] Thus, the key regulators of the cell cycle are promising targets for cancer therapy. [29] There is plenty of research studies that have proved the efficiency of cell cycle-targeted cancer therapy recently. [30] In this study, twenty DEGs including CXCL10, CD44, B2M, XRCC5, PTPN11, and RPL11 were identified as hub genes for LA-NPC metastasis. The protein tyrosine phosphatase nonreceptor type 11 (PTPN11) gene encodes the SHP-2 phosphatase, which participates in oncogenic events by regulating the expression of genes controlling cell cycle progression, cell proliferation, oncogene-induced senescence, and drug resistance. [31,32] CXC chemokine ligand 10 (CXCL10) is a member of the chemokine family secreted from various cell types. [33] Of note, a previous study has demonstrated that CXCL10 can cause cell cycle redistribution by prolonging G1 and shortening the S phase in cancer cells. [34] Wightman et al [35] showed that coexpression of CXCL10 and CXCR3 predicted an increased risk of metastasis and was associated with poor overall survival and early metastasis. Cluster of differentiation 44 (CD44) is a cell adhesion molecule that is thought as a potential diagnostic tumor marker. [36] The upregulation of CD44 can promote the proliferation and metastasis of tumor cells, while its downregulation inhibits cell cycle progression. [36] A meta-analysis showed that total CD44 isoforms overexpression was correlated with worse overall survival of patients with colorectal cancer, and CD44 expression was related to distant metastasis and poor differentiation. [37] X-ray repair cross-complementing 5 (XRCC5) is known as a DNA repair protein that exerts critical effects on genomic stability and tumorigenesis. [38] Ribosomal protein L11 (RPL11) is essential for cell viability and mutations in the RPL11 gene may contribute to cancer pathogenesis. [39,40] Beta-2-microglobulin (B2M) is a major histocompatibility complex class I molecule that displays antibacterial activity. B2M overexpression is correlated with a poor prognosis in several human cancers, including pancreatic ductal adenocarcinoma [41][42][43] and it has been considered as a potential target for cancer therapy. [41,42] Consistently, we found that CXCL10, CD44, B2M, XRCC5, PTPN11, and RPL11 were upregulated in the LA-NPC tumors with metastasis compared with tumors without metastasis, and genes CD44, B2M, PTPN11, and RPL11 were associated with the DFS in NPC patients in the GSE102349 dataset.
A study by Mi et al [26] identified 10 molecules, including etomidate, sanguinarine, verteporfin, and chrysin, were associated with NPC metastasis, and are potential drugs for the prevention and treatment of metastasis. However, few clinical trials have been reported for these drugs in NPC. The drug-gene interaction analysis in this study indicated that upregulated genes CXCL10, CD44, B2M, XRCC5, and RPL11 were potential druggable genes that were targeted by >1 FDA-approved drug. For instance, the CXCL10 gene is targeted by 9 molecules, including zidovudine, oxaliplatin, and methylprednisolone, and the B2M gene is targeted by 3 drugs including amikacin, pembrolizumab, and thyroglobulin. Drugs including pembrolizumab, oxaliplatin, methylprednisolone, hydrogen peroxide, and amikacin, have been used in the clinical treatment of NPC. [44][45][46][47][48] These results showed that these genes might be used as promising targets for NPC therapy or metastasis.
The limitation of this study is the lack of validation experiments for expression, metastatic/prognostic value, and druggability of genes, including CXCL10, CD44, B2M, XRCC5, PTPN11, and RPL11. The association of CD44, B2M, PTPN11, and TRIM74 with the DFS of NPC patients was preliminarily evaluated in the GSE102349 dataset. However, validation using microarray datasets or clinical trials reporting the prognostic values of the hub genes, or experiments confirming their therapeutic value might provide more evidence for the results of this study.

Conclusions
In conclusion, 107 DEGs between LA-NPC tissue specimens with and without distant metastasis after radical treatment were identified in the dataset of GSE103611. Upregulated expression of CXCL10, CD44, B2M, XRCC5, and RPL11 in LA-NPC tissues after radiotherapy treatment may promote metastasis. The univariate Cox regression analysis showed that the percent of stromal TILs and the expression levels of CD44, B2M, PTPN11, and TRIM74 were associated with DFS in NPC patients. Five DEGs, including CXCL10, CD44, B2M, XRCC5, and RPL11 might be potential druggable genes for patients with LA-NPC metastasis by regulating cell cycle-related pathways. Monitoring the expression profiles of these genes may provide a reference for clinical practice.